Learning Mappings between Data Schemas

Authors

  • AnHai Doan
  • Pedro Domingos
  • Alon Y. Levy
Abstract

To build a data-integration system, the application designer must specify a mediated schema and supply descriptions of the data sources. A source description contains a source schema that describes the content of the source, and a mapping between the corresponding elements of the source schema and the mediated schema. Manually constructing these mappings is both labor-intensive and error-prone, and has proven to be a major bottleneck in deploying large-scale data-integration systems in practice. In this paper we report on our initial work toward automatically learning mappings between source schemas and the mediated schema. Specifically, we investigate finding one-to-one mappings for the leaf elements of source schemas. We describe LSD, a system that automatically finds such mappings. LSD consults a set of learner modules, each of which looks at the problem from a different perspective, and then combines the predictions of the modules using a meta-learner. The learner modules draw on knowledge from the World-Wide Web as well as on ideas from machine learning and information retrieval. We report experimental results of applying LSD to five sources in the real-estate domain.
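The multi-learner architecture described in the abstract can be illustrated with a minimal sketch. This is not LSD's implementation: the mediated-schema labels, the two toy learners (`name_learner`, `content_learner`), and the fixed combination weights are all assumptions made for illustration. In particular, the abstract describes a meta-learner, which would normally learn how to combine the modules, whereas this sketch uses hand-set weights.

```python
from collections import defaultdict

# Hypothetical mediated-schema elements (not taken from the paper).
MEDIATED = ["price", "address", "phone"]

def name_learner(name, values):
    """Toy learner: score a label 1.0 if it appears as a token of the element name."""
    tokens = set(name.lower().replace("_", " ").split())
    return {label: (1.0 if label in tokens else 0.0) for label in MEDIATED}

def content_learner(name, values):
    """Toy learner: score labels from simple hand-written cues in the data values."""
    scores = dict.fromkeys(MEDIATED, 0.0)
    for v in values:
        if v.startswith("$"):
            scores["price"] += 1.0
        elif v.replace("-", "").isdigit():
            scores["phone"] += 1.0
        else:
            scores["address"] += 1.0
    total = len(values) or 1
    return {label: s / total for label, s in scores.items()}

def combine(predictions, weights):
    """Stand-in for a meta-learner: weighted sum of per-label scores, then argmax."""
    combined = defaultdict(float)
    for pred, w in zip(predictions, weights):
        for label, score in pred.items():
            combined[label] += w * score
    return max(combined, key=combined.get)

def match(name, values, weights=(0.5, 0.5)):
    """Map one source-schema leaf element to a single mediated-schema element."""
    preds = [name_learner(name, values), content_learner(name, values)]
    return combine(preds, weights)

if __name__ == "__main__":
    print(match("listing_price", ["$250,000", "$310,000"]))
    print(match("contact", ["206-555-0100", "206-555-0199"]))
```

Each learner sees the same leaf element (its name and a sample of its values) but uses different evidence, so an uninformative element name such as `contact` can still be matched through its data values. Returning a single label per element corresponds to the one-to-one mappings the abstract restricts itself to.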

Similar resources

Data Integration: A “Killer App” for Multistrategy Learning

To build a data-integration system, the application designer must specify a mediated schema and supply the descriptions of data sources. A source description contains a source schema that describes the content of the source, and a mapping between the corresponding elements of the source schema and the mediated schema. Manually constructing these mappings is both labor-intensive and error-prone, ...

Learning Source Descriptions for Data Integration

To build a data-integration system, the application designer must specify a mediated schema and supply the descriptions of data sources. A source description contains a source schema that describes the content of the source, and a mapping between the corresponding elements of the source schema and the mediated schema. Manually constructing these mappings is both labor-intensive and error-prone,...

Reuse of Schema Mappings for Data Transformation Design

The definition of data transformations between heterogeneous schemas is a critical activity of any database application. Currently, automated tools provide high level interfaces for the discovery of correspondences between elements of schemas, but transformations (i.e., schema mappings) need to be manually specified every time from scratch, even if the problem at hand is similar to one that has...

Learning Source Description for Data Integration

To build a data-integration system, the application designer must specify a mediated schema and supply the descriptions of data sources. A source description contains a source schema that describes the content of the source, and a mapping between the corresponding elements of the source schema and the mediated schema. Manually constructing these mappings is both labor-intensive and error-prone,...

Composing Mappings Between Schemas Using a Reference Ontology

Large-scale database integration requires a significant cost in developing a global schema and finding mappings between the global and local schemas. Developing the global schema requires matching and merging the concepts in the data sources and is a bottleneck in the process. In this paper we propose a strategy for computing the mapping between schemas by performing a composition of the mappin...

Publication date: 2000